[CORE] Make LeafTransformSupport's getPartitions return Seq[Partition] #10838
zhztheplayer merged 1 commit into apache:main
Thanks. Should we also refactor GlutenWholeStageColumnarRDD? Does soft-affinity still work correctly after the change?
For GlutenWholeStageColumnarRDD, I think the modifications below are sufficient. Do you think it needs to be modified in this PR as well? (Besides, NativeFileScanColumnarRDD doesn't seem to be used. Should I delete it?)

    case class FirstZippedPartitionsPartition(
        index: Int,
        inputPartition: Partition,
        inputColumnarRDDPartitions: Seq[Partition] = Seq.empty)
      extends Partition

    class GlutenWholeStageColumnarRDD(
        @transient sc: SparkContext,
        @transient private val inputPartitions: Seq[Partition],

Do you mean getPreferredLocations? I haven't modified its logic, so it still applies. For example, GlutenWholeStageColumnarRDD will do the correct cast. Or it could simply be:

    override protected def getPreferredLocations(split: RDDPartition): Seq[String] = {
      split.asInstanceOf[FilePartition].preferredLocations()
    }
Yes and I saw you're working on this. Thanks.
For sure. Thanks.
      })

    - val allSplitInfos = getSplitInfosFromPartitions(isKeyGroupPartition, leafTransformers)
    + val allSplitInfos = leafTransformers.map(_.getSplitInfos).transpose
Do we need to preserve a comment similar to this one? It would help readers understand the .transpose usage here.
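To illustrate what the .transpose call does in the line under review, here is a minimal sketch. It assumes each leaf transformer yields one split info per partition index; SplitInfoStub and the scan names are hypothetical stand-ins, not Gluten's actual classes. Transposing turns a "one list per leaf transformer" layout into a "one list per partition" layout.

```scala
object TransposeSketch {
  // Hypothetical stand-in for a split info; not Gluten's SplitInfo class.
  final case class SplitInfoStub(leaf: String, partitionIndex: Int)

  // Outer Seq: one entry per leaf transformer.
  // Inner Seq: one split info per partition of that transformer.
  val perLeaf: Seq[Seq[SplitInfoStub]] = Seq(
    Seq(SplitInfoStub("scanA", 0), SplitInfoStub("scanA", 1)),
    Seq(SplitInfoStub("scanB", 0), SplitInfoStub("scanB", 1))
  )

  // After transpose the outer Seq has one entry per partition,
  // and each inner Seq holds that partition's split info from every leaf.
  val perPartition: Seq[Seq[SplitInfoStub]] = perLeaf.transpose

  def main(args: Array[String]): Unit =
    perPartition.zipWithIndex.foreach { case (splits, i) =>
      println(s"partition $i -> ${splits.map(_.leaf).mkString(", ")}")
    }
}
```

Note that Scala's transpose requires every inner collection to have the same size, so a sketch like this only applies when all leaf transformers expose the same number of partitions.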
@zhztheplayer I rebased and resolved the conflict. Can you have a look? Thanks.
What changes are proposed in this pull request?
InputPartition is actually just a DSv2 interface API, while Partition is the common one. LeafTransformSupport's getPartitions should return Seq[Partition], for example:
- FileSourceScanExecTransformer returns Seq[FilePartition]
- BatchScanExecTransformer returns Seq[DataSourceRDDPartition]
Especially after Spark introduced Storage-Partitioned Join, a DataSourceRDDPartition may contain multiple InputPartitions. See DataSourceRDD.
Because of this mismatch, some if-else code was introduced previously; this PR cleans it up.
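The relationship described above can be sketched without Spark on the classpath. Everything below is a simplified, hypothetical model: the traits and the *Stub names only mimic the shape of Spark's Partition, InputPartition, FilePartition and DataSourceRDDPartition and of Gluten's LeafTransformSupport, and are not the real definitions.

```scala
object PartitionSketch {
  // Simplified stand-ins for Spark's types; not the real classes.
  trait Partition { def index: Int }
  trait InputPartition

  final case class InputPartitionStub(name: String) extends InputPartition

  // Models a file-scan partition: a single RDD partition.
  final case class FilePartitionStub(index: Int, files: Seq[String])
    extends Partition with InputPartition

  // Models a DSv2 RDD partition that may group several InputPartitions,
  // as can happen with Storage-Partitioned Join.
  final case class DataSourceRDDPartitionStub(
      index: Int,
      inputPartitions: Seq[InputPartition])
    extends Partition

  // Hypothetical model of the unified signature: every leaf returns Seq[Partition].
  trait LeafTransformSupportStub { def getPartitions: Seq[Partition] }

  object FileScanStub extends LeafTransformSupportStub {
    def getPartitions: Seq[Partition] =
      Seq(FilePartitionStub(0, Seq("a.parquet")))
  }

  object BatchScanStub extends LeafTransformSupportStub {
    def getPartitions: Seq[Partition] =
      Seq(DataSourceRDDPartitionStub(
        0, Seq(InputPartitionStub("p0"), InputPartitionStub("p1"))))
  }

  // Callers can treat both leaves uniformly, with no per-type branching.
  val allPartitions: Seq[Seq[Partition]] =
    Seq(FileScanStub, BatchScanStub).map(_.getPartitions)
}
```

With a common return type, code that collects partitions from several leaves no longer needs to branch on which kind of scan produced them.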
How was this patch tested?